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Abstract 

We present an efficient and parsimonious algorithm to solve mixed initial/final- 
value problems. The algorithm optimally limits the memory storage and the com- 
putational time requirements : with respect to a simple forward integration, the cost 
factor is only logarithmic in the number of time-steps. As an example, we discuss 
the solution of the final-value problem for a Fokker-Planck equation whose drift 
velocity solves a different initial-value problem - a relevant issue in the context of 
turbulent scalar transport. 
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1 Introduction 



In the investigation of dynamical systems, the standard initial-value problem 
is to compute from the equations of motion the state of the system at a final 
time t, given its initial condition at time to- Sometimes, however, the state 
of the system might be known at a final time t, and one would be interested 
in evolving the system backward in time to compute its earlier states, back 
to to. This can in theory be easily accomplished by reversing the direction of 
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the time-integration, thus transforming the final-value problem in an initial- 
value one. Problems might however appear if the forward evolution is given 
by a mapping that is not one-to-one, as the previous state can thus become 
undefined. Even if the time evolution is given by a differential system, the 
backward evolution becomes unstable if the forward dynamics is dissipative, 
as is the case in many physical systems, like for instance Navier-Stokes tur- 
bulence. The problem stems from the fact that a dissipative system contracts 
volumes in phase space in the forward direction, and thus expands them in 
the backward direction and amplifies any small numerical errors, like those 
caused by roundoff. 

Another quite difficult task is to obtain the evolution of a system when part 
of the variables that specify the state are given at an initial time to and the 
remaining ones are given at the final time t. We refer to this class of problems 
as "mixed initial/final- value". We will be interested in a special subclass of 
such problems, that can be schematically written as follows 



du 

— = f(u,s), u(to)=u (1) 
dz 

— = g(z,u,s), z{t) = z t , (2) 

where u and z are vectors in a given space. Far from being an academic 
oddity, this problem is relevant to many physical situations, among which we 
will discuss in detail the transport of scalar fields by a dynamically evolving 
flow. Consider indeed the problem of finding the solution a(s) of the stochastic 
differential equation 

^- = v(a(s),s) + V2^r 1 (s) , (3) 



with the final value a(t) = x. 

Eq. (3) describes the evolution of a particle transported by the velocity field v 
and subject to molecular diffusion with diffusivity k, represented here by the 
zero-mean Gaussian process 77 with correlations (Vi(t)Vj(t')) — Sij5(t — t'). The 
velocity field v at any time s has to be obtained from some dynamical law 
(e.g. the Navier-Stokes equations) and from its initial value at s = t . It is 
easy to recognize that v plays the role of the variable u in Eqs. (1-2) whereas 
a has to be identified with z. An equivalent description may be given in terms 
of the transition probability P(y,s\x,t) - i.e. the probability that a particle 
is in y at time s given that it will be in x at time t. The propagator evolution 
is ruled by the well known Kolmogorov equation [1,2] 

- d s P(y, s\x, t)-V y - [v(y, s)P(y, s\x, t)] = k V 2 y P(y, s\x, t) , (4) 
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where the final condition is set : P(y,t\x,t) = 8{x — y). In the latter case, it 
is P that has to be interpreted as z in (2). 

In this article, we propose a fast and memory-sparing algorithm to solve the 
problem (1-2) or to allow one to go back through the time evolution if the 
dynamics is unstable or non-invertible. In Sec. 2 we describe in detail the 
algorithm, comparing it to more naive and less efficient strategies. In Sec. 3 
we present an application to the problem of front generation in passive scalar 
turbulence (see e.g. [3,4]). 



2 Backward Algorithm 

The obvious difficulty with Eqs. (1-2) resides in the fact that, since the initial 
conditions of u and z are set at different times, they cannot be evolved in 
parallel. Also, the time evolution of u(s) might be non-invertible or unstable 
in the backward time direction. The whole history of u(s) in the interval [to, t] 
is thus needed to integrate z(s) from time t back to time t . 

Before presenting our own algorithm, we wish to discuss some naive strate- 
gies to expose their shortcomings and advantages, and introduce notations. 
In the following, we will assume the whole time interval [to,t] to be dis- 
cretized in N identical time steps, small enough to ensure accurate integra- 
tion of Eqs. (1-2). The states u(s), z(s) thus have to be computed at the 
N + 1 times to, ... , tj = to + (t — t )j/N, . . . ,tw = t. In our applications, the 
states u(s) and z(s) will be rf-dimensional vector fields, numerically resolved 
with L d collocation points, and therefore have a size 0(dL d ), typically very 
large, that will be taken as unit of measure when describing the storage re- 
quirements B(N) of the different algorithms. The CPU time cost 7(N) will 
refer only to (forward) integrations of u and will be expressed in terms of the 
time to perform a single forward integration step. We will also give examples 
of memory use and CPU time for d — 2, £> = 1024 and N = 2 14 , which are 
typical values of moderately resolved direct numerical simulations in compu- 
tational fluid dynamics, requiring 16 MB of memory to store a single state 
array. 

The most obvious and simple strategy is the following : 

Al. integrate forward Eq. (1) from t to t N and store u(s) at all time 

steps to . . An ; 
A2. integrate backward Eq. (2) from tN back to t . 

The number of integration steps needed by this procedure is 7(N) = N, 
while the memory storage cost is a frightening §(N) = N. As soon as the 
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dimensionality of the space or the number of collocation points increase, this 
approach becomes rapidly unfeasible. Taking our typical fluid dynamics value, 
one would need 256 GB of memory, which is clearly irrealistic. 

A different strategy that minimizes the memory requirements is : 

Bl. set n <— TV and store the initial condition u ; 
B2. integrate forward Eq. (1) from t to t n ; 

B3. integrate backward Eq. (2) from t n to t„_i, update n <— n — 1, and go 
back to step B2 if n > 0. 

While this method is very advantageous in memory §>(N) = 1, it is pro- 
hibitively expensive because of the large number of iterations needed : T(N) = 
N(N + l)/2. With the previously given numerical parameters, one needs a 
daunting increase by a factor 8200 in CPU time with respect to algorithm A. 

To improve algorithm B, one can think of using more memory and a simple 
generalization goes as follows : 

CI. integrate forward Eq. (1) from to to and store the states u{s) at the 
M equidistant times r k = t^k/M, k — 0, . . . , M — 1 (we assume here N to 
be a multiple of M for convenience) ; 

C2. apply algorithm B successively in each segment [rfc,Tfc+i]. 

The number of operations is 7(N) = N(N + M) / (2M) remains however 
prohibitive unless we raise M to be O(N). Now, since the memory storage 
is §(N) — M, M cannot be made too large as well. Again refering to the 
numerical parameters given above, we have that for M — 16 the storage re- 
quirement is reasonably low (256 MB) yet the time factor with respect to 
algorithm A is a still discouraging 512. 

A further possibility which helps reducing the number of iterations and is 
almost reasonable for the memory storage needs is : 

Dl. integrate forward Eq. (1) from to to tjsr and store M states u(s) at 

times r fc = t Nk / M , k — 0, . . . , M — 1. Set k <— M — 1 ; 
D2. integrate forward Eq. (1) from t Nk / M to ijv(fc+i)/Af , using the stored state 

at Tfc as initial condition and saving the states u(s) at all time steps in a 

further set of N/M storage locations; 
D3. integrate backward Eq. (2) from tjv(fc+i)/M to tNk/M using the N/M saved u(s), 

update k <— k — 1 and go back to step D2 if k > 0. 

This procedure needs a reasonable total number of time steps 7(N) = N + 
(N - M)(M - I)/ M, that is 7{N) w 2N when N > M > 1, and is thus 
asymptotically linear in N, provided we have enough memory. The storage 
requirement is indeed rather large §(N) = M + N/M and is minimized for a 
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fixed N by taking M = y/N. With our typical parameters, we would have to 
take M = 128 and store 256 fields, amounting to roughly 4 GB, still too large 
for typical workstations. 

Algorithm D is still too greedy in memory, but it gives the idea of dividing the 
problem into smaller subproblems that have a much smaller running time, and 
that be combined later to give the full solution. If we push this idea further, 
we can build a recursive algorithm that integrates backward from to to by 
integrating forward from t to t/v/2, and using the states at to and t/v/2 to call 
itself successively in the intervals [tjv,tjv/2] and [tjv/ 2 ,to]. We have chosen here 
the subdivision base (the equivalent of M in the previous algorithm) to be 2, 
because it gives the simplest and one of the most efficient algorithms, but 
other bases could be used to slightly reduce the number of integration steps, 
at the price of using more storage. Of course, recursion can be eliminated and 
it is in its non- recursive form that we will describe our procedure. To do that, 
we will need a stack, that is a list of states to which we can add (push/save) 
a new item or remove (pull/delete) the last stored item. A stack can always 
be implemented as an array in programming languages that do not have it as 
a built-in type. We will also use the index [top] to refer to the (last pushed) 
element on top of the stack. Our algorithm is then very easy to state : 

Rl. set the desired time index n <— N and push the initial condition Uq on 
the stack ; 

R2. if the state on top of the stack does not correspond to the index n, 
set j <— (j[top] + n + l)/2 to the upper midpoint of the interval, integrate 
forward U[ top ] from t[ top ] to tj, push the state Uj on the stack and go back 
to step R2; 

R3. pull the state Ur top ] = u n from the stack, use it to integrate backward 
Eq. (2) from t n to t n _i, set n <— n — 1, and go back to step R2 if n > 0. 

To understand better the behavior of algorithm R, the easiest is to show an 
example of how it works in a simple case for a small value of N. Fig. 1 does 
this for N = 20, showing the stack movements at every time step. Even for 
such a small value of N, algorithm R needs 3.5 times less memory and is only 
2.1 times slower than algorithm A, while it is 5 times faster than algorithm B. 

It is obvious that our algorithm needs a very small amount of storage S(iV) = 
1+ |~log 2 (iV)] , that is only 15 fields or 240 MB for our typical example with N = 
2 14 . The computing time is also very reasonable: the computing time 7(N) 
obeys the recursions 7(N) = 2 7(N/2) + N/2 - 1 if JV is even and 7(N) = 
2 7([N/2\) + \N/2] if it is odd. The number of steps thus depends on the 
precise binary representation of N, but is given approximately by 7(N) m 
N\log 2 {N + 1)] /2 + 1 (equality being achieved if N is a power of 2), that is a 
cost factor that is only logarithmic in the number of time steps. In our same 
example, we find that we will need 7 times more integration steps than the 
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Fig. 1. Algorithm R in action for N = 20. The state of the stack is shown for every 
time tj at the beginning of label R3, where the state u(tj) becomes available on 
top of the stack. Steps is the number of forward integration steps needed for this 
particular time, while Total steps is the number of forward steps since the final 
time t 2 o to the current time. 

brute force algorithm A, but 1100 times less storage, so that the whole stack 
can be kept in-core during the backward integration. 



The algorithm we propose is thus quite efficient in computing time, and very 
economical in memory, opening the door to the study of the backward evo- 
lution of very large mult i- dimensional fields. To give an idea of possible ap- 
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plications, one might study the "seed" at to that gave birth to a particular 
structure observed at time t. As an example, we will discuss in the following 
section the numerical implementation and an application of this algorithm to 
scalar transport in turbulent flows. 



3 Scalar fields in turbulent flows 



The transport of scalar fields, such as temperature, pollutants and chemical 
or biological species advected by turbulent flows, is a common phenomenon of 
great importance both in theory and applications [3]. A scalar field, 9(x,t), 
obeys the advection-diffusion equation 

d t 9 + v ■ V9 = kV 2 9 + , (5) 



where k is the molecular diffusivity, v is the velocity field, and is the scalar 
input acting at a characteristic lengthscale L^. The presence of a scalar source 
allows for studying stationary properties. Thanks to the linearity of Eq. (5), 
the problem can be solved in terms of the particle propagator [3,4] 

t 

9(x, t) = jdsjdy P(y, s\x, t) <f>(y, s) , (6) 



as can be directly checked by inserting (6) in (5) and using (4). To make more 
intuitive the physical content of Eq. (6), we can rewrite it as 



9(x,t) = (Jds<f>(a(s),s)\ , (7) 

*0 ' a 

where (. . .) Q denotes the average over particle trajectories obeying (3) with 
a(t) = x. From (7) one understands that 8(x,t) is built up by the superposi- 
tion of the input along all trajectories ending at point x at time t. 

The velocity field evolves according to the Navier-Stokes equation : 

d t v + v - Vv = -Vp + vV 2 v + f . (8) 



Where the pressure p is fixed by the incompressibility condition (V • v — 0), v 
is the kinematic viscosity, and / the energy input. Notice that 9 does not enter 
the equation for the velocity field and therefore the scalar is called passive. 
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In the following we will consider a passive scalar field evolving in a two dimen- 
sional turbulent velocity field, and show how the numerical study of particle 
propagator conveys some information on the dynamical origin of structures in 
the scalar field. 

3.1 Numerical Implementation 

We integrate Eqs. (5), (8) and (4) in a doubly periodic box 2n x 2n with 
L x x L y grid points (the results here discussed are for L x = SL y = 1024) by 
a standard 2/3-dealiased pseudo-spectral method [5,6]. A detailed description 
of the properties of the velocity field in two-dimensional Navier-Stokes tur- 
bulence can be found in [7]. Here we only mention that the velocity field is 
self-similar with Kolmogorov scaling i.e. (v(x + r,t) — v(x,t)) ■ r/r ~ r 1 / 3 . 
However, the passive scalar increments 6(x + r) — 0{x) are not self-similar, 
since large excursions occur with larger and larger probability for increasingly 
small separations r (see, e.g., [8,9]). 

Time integration of Eqs. (5) and (8) is performed using a second order Runge- 
Kutta scheme modified to integrate exactly the dissipative terms. Both the 
velocity field and passive scalar were initialized to zero and integrated for 
a transient until a statistically stationary state was reached. The propaga- 
tor is initialized at the final time as a Gaussian P(y, t\x,t) = exp[— \x — 
y\ 2 /(2S 2 )]/(\^2n5), where the width 5 is of the order of few grid points The 
time evolution of Eq. (4) is implemented by a second-order Adams-Bashforth 
scheme modified to exactly integrate the dissipative terms. The adoption of 
different schemes for the forward and backward integration is motivated by the 
requirement of minimizing the use of Fast Fourier Transforms. To implement 
the backward algorithm it is also necessary to store the scalar and velocity 
forcings, and this is easily accomplished by including in the stored states the 
seed(s) of the pseudo-random number generator (s). 

The quality of the integration can be tested using the following relation 

s 

J ds> J dy P(y, s'\x, t) 0(y, s') = Jdy P(y, s\x,t) 0{y, s) , (9) 
o 

which stems from (4) and (5). In Fig. 2a we show both sides of (9), the quality 
of the integration is rather good. 

We have also performed Lagrangian simulations, i.e. we have integrated par- 
ticle trajectories evolving backward in time according to Eq. (3). For the in- 
tegration we used an Euler-Ito scheme, and the particle velocity has been 
obtained by means of a bilinear interpolation. In Fig. 2b we show the r.h.s 
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Fig. 2. (a) Jq ds' fdyP(y,s'\x,t)4>(y,s') and fdyP(y,s\x,t)6(y,s) as a function 
of time s. Time is expressed in unit of time steps (longer integration times display 
the same features). The difference is detectable only looking at absolute errors, here 
shown in the inset, (b) JdyP(y,s\x,t)0(y,s) obtained integrating the propagator 
and by integrating 10 6 particles initially distributed according to P(y, t\x, t). In the 
inset, the absolute error. 



of (9) evaluated with the propagator and with the Lagrangian trajectories the 
final condition of which have been set according to the propagator distribu- 
tion P(y, t\x, t). We recall that in the limit of infinite particles the propagator 
is exactly recovered. The good agreement of Fig. 2b reflects the fact that, al- 
though pseudo-spectral methods are not suited to preserve the positivity of the 
propagator, the presence of small negative regions is not severely penalizing. 
Indeed, a closer inspection of the propagator shows that the negative values 
are limited to small amplitude oscillations where P(y,s\x,t) is vanishingly 
small. 



3.2 Frontogenesis in passive scalar advection 



A striking and ubiquitous feature of passive scalar turbulence is the presence 
of fronts (also called "cliffs" or "sheets"), i.e. regions where the scalar has very 
strong variations separated by large regions ("ramps" or "plateaux") where 
scalar fluctuations are weak (see Fig. 3) [8,9,10,11,12,13,14,15,16]. 

The genesis of fronts and plateaux is best understood in terms of particle 
trajectories : to trace back the build-up of large and small scalar difference we 
study the evolution of the propagator 



X(y, s\x, x + r,t)= P(y, s\x + r,t)- P(y, s\x, t) , 



(10) 
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Fig. 3. Left : Typical snapshots of the scalar field 8. Note the presence of sharp fronts 
separating large regions in which the scalar assumes close values. Right : Close-up 
of a region containing a front. Across the upper and the middle spot there is a front, 
whereas the middle and lower one lie in a plateau. The distance between consecutive 
spots is larger than the diffusive scale L K , but smaller than the injection scale L^. 
In this simulation L K ~ 2 , ~ 170, the spot separation and diameter are ~ 25 
and « 15, respectively. Lengths are expressed in grid points. 



which is related to the scalar difference by the formula 

t 

6{x + r,t) - 9(x, t)= J ds J dy x(y, s\x,x + r, t) <f)(y, s) . (11) 

o 



Notice that \ evolves backward according to Eq. (4), with the final condition 
X(y, s\x,x + r,t) = 5(y - x - r) - 5(y - x). 

The numerical procedure was as follows. After the integration of Eqs. (5) 
and (8) over five eddy turnover times (the typical time-scale of large-scale 
motion) we choose x and r such that x, x + r are on a front or a plateau, 
respectively (see the right panel of Fig. 3). Then \ is integrated backward 
in time. In Fig. 4, we show four snapshots of the backward evolution of the 
field x- Already at a first glance the evolution of \ appears very different for 
the two final conditions : the blobs starting inside a plateau (first column) 
experience a strong mixing, while blobs lying initially across a front mix very 
poorly remaining compact and far aside. This is the basic mechanism for 
the formation of intense structures in passive scalar turbulence (for a related 
theoretical study see [17]). 
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Fig. 4. From top to bottom : backward evolution of x(y, s|a;, f) for x and r starting 
in a plateau (first column) and across a front (second column) , see the right panel of 
Fig. 3. Colors are coded according to the intensity of the field x, yellow is for positive 
values and blue for negative ones. At each time, the intensity is normalized according 
to the maximum of the fields in absolute value. The relatively smaller intensity on 
the first column is due to the fast mixing leading to strong cancellations between 
the positive and negative parts. Time is in eddy turnover times, the total number of 
time steps is 2 14 . To compare with figure 3, here the panel is 900 x 900 grid points. 
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